MOST RECENT GOVERNMENT REVIEW:

6 August 2010, at the University of Virginia: one-day technical review attended by the review team of Robert
Herklotz (AFOSR).

PROGRAM  OBJECTIVE:  The  objective  of  this  Self  Regenerative  Incorruptible  Enterprise  topic  is  to  develop 
technologies  that  will  enable  information  systems  to  learn,  regenerate  themselves  in  response  to  unforeseen  errors 
and/or attacks, and automatically improve their ability to deliver critical services. If successful, self-regenerative
systems will reconstitute themselves to their initial operating capability while decreasing their
vulnerability  to  an  ever-increasing  number  of  attacks.  In  this  project  the  team  is  developing  autonomic  recovery  and 
reconstitution  techniques,  which  together  it  calls  self-regeneration,  for  enterprise  systems  —  clients  and  servers  —  for 
both  incidental  failures  and  deliberate  attack.  Enterprise  computing  systems  need  to  be  not  only  highly  available,  but 
also highly resistant to attack. Enterprise computing systems will fail due to software flaws, attacks, or hardware
device  failures.  As  a  result,  it  is  not  adequate  to  assume  that  these  systems  will  be  able  to  always  provide  critical 
network  services  without  building  mechanisms  that  account  for  and  handle  these  failures  in  service. 

MURI  CONSORTIUM  RESEARCH  TEAM  MEMBERS: 

• Anup Ghosh, PI, George Mason University

• Sushil Jajodia, George Mason University

• Angelos Keromytis, Columbia University

•  Salvatore  Stolfo,  Columbia  University 

•  Jason  Nieh,  Columbia  University 

•  Peng  Liu,  Pennsylvania  State  University 


SCIENTIFIC  APPROACH: 

In  this  project,  the  team  is  developing  autonomic  recovery  and  regeneration  mechanisms  that  will  enable  commodity 
systems to detect attacks, corruptions, and failures, then self-regenerate to a known good state, for both program and
data,  while  increasing  the  reliability  and  security  of  the  software  to  be  more  resistant  and  less  vulnerable  to  attack. 
The  techniques  allow  for  imperfect  software  systems  to  be  deployed,  but  by  providing  autonomic  recovery  and 
regeneration,  these  systems  will  remain  highly  available  and  adaptive  such  that  they  will  recover  and  be  more 
resistant after attack. This multi-disciplinary, multi-university team is developing a loosely coupled,
layered  defense  system  approach  to  system  self-regeneration  in  two  parts:  whole  system  regeneration  and 
application-based  regeneration.  The  whole  system  approach  uses  machine  virtualization  in  a  transaction-based 
model  to  roll  operating  systems  back  to  the  last  known  good  state  prior  to  corruption  or  attack.  The  key  innovation 
here  is  using  a  transaction-based  model  for  commodity  operating  systems  to  enable  consistent  recovery  and  forward 
recovery  with  correction.  The  team  is  also  developing  application-specific  self-regeneration  techniques  that  employ 
verified  error  virtualization  to  enable  a  service  to  continue  running  even  in  the  presence  of  faults  and  attacks  that 
would  otherwise  crash  the  server  or  provide  unauthorized  privileged  access.  In  deployment,  the  application-specific 
techniques  will  provide  fine  grained  and  frequent  micro-recovery  actions  that  keep  servers  functioning  through 
software  errors  and  attacks,  while  dynamically  patching  vulnerable  software  functions  to  render  running  software 
more  resilient  after  attack  or  failure. 

For system self-healing, the research aims to provide an isolation and application interaction-tracing framework. In
this system, all application interactions are recorded as transactions. The aim is to provide a recovery system that
supports  online  recovery  upon  detection  of  a  corruption  without  prohibitive  performance  and  storage  requirements. 
By  recording  application  activities  as  container-based  transactions,  one  trades  the  capability  for  taint  tracking  and 
replay  within  a  container  for  lower  processing  and  storage  cost.  The  team  has  implemented  a  prototype,  termed 
Journaling Computing System (JCS), using OpenVZ, a lightweight process virtualization mechanism.

With respect to recovery of database systems after attack, the team applies state machine theory: as long as the state
transitions  of  an  information  system  can  be  properly  logged,  damage  assessment  can  be  done  via  state  classification 
(and  dependence  analysis),  and  recovery  can  be  done  via  state  roll-back/roll-forward  operations.  This  work  includes 
transactional  database  intrusion  recovery  to  guarantee  atomicity  and  consistency  and  development  of  a  self-healing 
database  management  system  (DBMS)  to  self-recover  from  malicious  data  corruptions  without  suffering  from  any 
downtime.  This  approach  includes  dynamic  quarantine,  fine-grained  dependency  analysis,  and  multi-version  based 
online  repairs. 
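
The damage assessment step described above can be illustrated with a minimal sketch. It assumes a hypothetical log format in which every transaction records the items it read and wrote; the names `Txn` and `assess_damage` are illustrative and are not taken from the team's DBMS. Damage spreads forward through read dependencies, and the resulting sets are what a roll-back/roll-forward repair would need to undo or redo.

```python
from collections import namedtuple

# Hypothetical log record: a transaction id plus the items it read and wrote.
Txn = namedtuple("Txn", ["tid", "reads", "writes"])

def assess_damage(log, malicious_tids):
    """Return (tainted transaction ids, tainted data items).

    The log is assumed to be in commit order. A transaction is tainted if it
    is known to be malicious or if it read an item that is currently tainted.
    """
    bad_txns = set(malicious_tids)
    bad_items = set()
    for txn in log:
        if txn.tid in bad_txns or bad_items & set(txn.reads):
            bad_txns.add(txn.tid)
            bad_items |= set(txn.writes)
        else:
            # Simplifying assumption: a clean transaction that overwrites an
            # item leaves a clean value behind.
            bad_items -= set(txn.writes)
    return bad_txns, bad_items

log = [
    Txn("T1", reads=[], writes=["a"]),      # the malicious transaction
    Txn("T2", reads=["a"], writes=["b"]),   # reads corrupted item a
    Txn("T3", reads=["c"], writes=["d"]),   # independent of the attack
    Txn("T4", reads=["b"], writes=["e"]),   # reads corrupted item b
]
bad_txns, bad_items = assess_damage(log, {"T1"})
print(sorted(bad_txns), sorted(bad_items))
```

In this toy log only T3 is unaffected, so a repair would undo T1, T2, and T4 and keep T3's work intact.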

With  respect  to  application  self-healing,  the  approach  is  based  on  the  concept  of  software  elasticity.  Briefly,  this 
refers  to  the  ability  of  software  to  recover  from  faults  (accidental  or  purposeful)  under  the  right  conditions,  by 
translating  exceptions  caused  by  unforeseen  failures  into  (possibly  unrelated)  error  conditions  that  are  handled  by 
the software. The approach exploits this property to make legacy code elastic, enabling software self-healing against a
broad range of failures and attacks. To achieve this, the team leveraged mechanisms from a
multitude  of  Computer  Science  disciplines,  including  programming  languages,  operating  systems,  machine  learning, 
and  graph  theory.  The  primary  challenges  in  this  scheme  revolve  around  balancing  the  3-way  tradeoff  between: 

- effectiveness of the self-healing mechanism (range of faults/attacks caught and mitigated)
- performance impact on the hardened software system
- safety of the recovery/self-healing operation

To that end, the team developed a series of novel mechanisms and algorithms that demonstrate the notion of “error
virtualization using rescue points”. In the work to date, the team has experimentally evaluated the effectiveness of
their technique against real vulnerabilities and bugs, as well as against a much larger set of synthetic vulnerabilities;
they have also characterized the performance overhead of the system, showing it to be very low. The team has
demonstrated the feasibility of creating completely automated self-healing software by building a fully integrated
architecture and system, named ASSURE. This work has appeared in some of the top security and systems
conferences (ASPLOS, USENIX Technical, IEEE Security & Privacy, NDSS, ACM CCS).
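
To make the idea of error virtualization concrete, the following is a minimal Python sketch rather than the ASSURE implementation. It assumes that a rescue point is a function whose caller already handles a known error return value; the decorator `rescue_point`, the `sessions` dictionary, and the error value `-1` are all hypothetical. An unforeseen exception inside the function is rolled back to the checkpoint and then reported as that known error.

```python
import copy
import functools

def rescue_point(state, error_value):
    """Wrap a function so unforeseen exceptions become a known error return."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            checkpoint = copy.deepcopy(state)   # snapshot taken at the rescue point
            try:
                return func(*args, **kwargs)
            except Exception:
                state.clear()
                state.update(checkpoint)        # roll back side effects
                return error_value              # an error the caller already handles
        return wrapper
    return decorate

sessions = {"count": 0}

@rescue_point(sessions, error_value=-1)
def handle_request(payload):
    sessions["count"] += 1
    return len(payload["body"])   # raises KeyError on a malformed request

ok = handle_request({"body": "hello"})
bad = handle_request({})          # fault is virtualized into the -1 error path
print(ok, bad, sessions)
```

The service keeps running after the malformed request, and the partial update made before the fault is discarded, which is the property the three-way tradeoff above tries to preserve at low cost.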

MAJOR  ACCOMPLISHMENTS  TO-DATE: 

The following major accomplishments have been achieved to date:

• GMU developed a Moving Target Resilient Web Services framework. GMU is taking a proactive
approach to defense by constantly shifting the attack surface of the web service in order to introduce
uncertainty to the adversary. In particular, we leverage virtualization technologies to create virtual servers
(VSs), each configured with a different software mix, producing diversified attack surfaces that are rotated
with a short time constant. Assuming N different VSs, M < N will serve online requests at a time while
offline servers are reverted to a predefined pristine state. Because the set of M online servers changes
constantly and is selected with randomness, adversaries of a web service will have to face multiple,
unpredictable attack surfaces. While one cannot completely rule out exploits, successful attacks
against the web service will be more difficult and temporary, because compromised servers are reverted to
their pristine state. As a result, web services will be significantly more resilient to attacks.

Several  challenging  issues  must  still  be  addressed  to  materialize  this  vision.  First,  managing  the 
complexity  of  many  different  software  stacks  is  not  straightforward,  but  we  are  developing  techniques  to 
address the technology management issues. We are developing a RESTful implementation of Web services
that facilitates the construction of non-persistent systems without disrupting web services. We will also
consider  the  merits  of  a  large  number  of  diversified  attack  surfaces  versus  small  and  highly  hardened  ones. 

It  is  our  belief  that  today’s  sophisticated  web  applications  already  create  large  attack  surfaces.  Making 
them  unpredictable  while  constantly  shifting  their  attack  surface  will  raise  the  resilience  of  web  services 
against intrusion. To substantiate these claims in future work, we will address the need for new metrics to measure
service resilience.
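
The rotation scheme described in this item (M of N virtual servers online, the rest reverted) can be sketched as follows. This is an illustration under stated assumptions, not the GMU framework: the server names and the `rotate` function are hypothetical, and the actual revert to a pristine snapshot is left out.

```python
import random

def rotate(servers, m, rng):
    """Pick M of the N virtual servers to serve requests this epoch.

    Returns (online, offline); the offline set is what an orchestrator
    would revert to its pristine snapshot before it can be selected again.
    """
    if not 0 < m < len(servers):
        raise ValueError("need 0 < M < N")
    online = set(rng.sample(sorted(servers), m))
    offline = set(servers) - online
    return online, offline

# Hypothetical pool of N = 4 virtual servers with different software mixes.
servers = {"vs-apache-linux", "vs-nginx-bsd", "vs-iis-windows", "vs-lighttpd-linux"}

# A seeded generator keeps the sketch reproducible; a deployment would want an
# unpredictable source such as random.SystemRandom().
rng = random.Random(7)
for epoch in range(3):
    online, offline = rotate(servers, m=2, rng=rng)
    print(epoch, "online:", sorted(online), "reverting:", sorted(offline))
```

Requiring M < N guarantees that at least one server is offline and being cleaned in every epoch.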

• GMU implemented a secure desktop environment using lightweight virtualization and the Journaling
Computing System with recovery of tainted files. The GMU team designed a Linux-based secure desktop,
called SecTop, that instantiates a lightweight virtual environment for every application in a pre-configured
clean state. Using the JCS, application activities are recorded as state-based transactions. Performance
evaluation studies demonstrated acceptable overhead in processing and storage costs for desktop
systems, while providing protection and recovery after attack. We recently coupled this desktop with a
recovery system that is able to ascertain which files are corrupted, and selectively restore them back to their
pre-infection state without loss of untainted data.

•  GMU  developed  a  novel  Hypervisor  Integrity  Monitoring  technique.  In  order  to  assure  that  virtualized 
systems  are  not  compromised  themselves,  additional  protection  for  the  integrity  of  the  operating  system  and 
the  virtualization  execution  environments  is  needed.  To  that  end,  we  developed  HyperCheck,  a  hardware- 
assisted  tampering  detection  framework  designed  to  protect  the  integrity  of  VMMs  and,  for  some  classes  of 
attacks,  the  underlying  operating  system  (OS).  HyperCheck  leverages  the  CPU  System  Management  Mode 
(SMM),  present  in  x86  systems,  to  securely  generate  and  transmit  the  full  state  of  the  protected  machine  to 
an external server. Using HyperCheck, we were able to ferret out rootkits that targeted the integrity of both
the Xen hypervisor and traditional OSes. Moreover, HyperCheck is robust against attacks that aim to
disable  or  block  its  operation.  Our  experimental  results  show  that  HyperCheck  can  produce  and 
communicate  a  scan  of  the  state  of  the  protected  software  in  less  than  40ms. 

•  GMU  and  Columbia  developed  a  collaborative  exchange  network  for  network  anomalies.  We  have 
demonstrated  the  collaborative  deployment  of  network  Anomaly  Detection  (AD)  sensors.  Our  system 
examines the ingress HTTP traffic and correlates AD alerts from two administratively disjoint domains:
Columbia University and George Mason University. We have shown that, by exchanging packet content
alerts between the two sites, we can achieve zero-day attack detection and inform locally instrumented
server systems, with high confidence, that an attack has occurred.
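
A minimal sketch of cross-site alert correlation follows. It assumes, purely for illustration, that each site shares a hash of the suspicious packet content rather than the content itself; the functions `fingerprint` and `correlate` and the sample payloads are hypothetical and do not describe the deployed sensors.

```python
import hashlib

def fingerprint(payload: bytes) -> str:
    """Hash suspicious packet content so sites can compare alerts without raw data."""
    return hashlib.sha256(payload).hexdigest()

def correlate(local_alerts, remote_alerts):
    """Return local payloads whose fingerprints were also flagged by the peer site."""
    remote = {fingerprint(p) for p in remote_alerts}
    return [p for p in local_alerts if fingerprint(p) in remote]

# Content flagged as anomalous by each site's AD sensor (made-up examples).
site_a = [b"GET /a?x=%90%90%90", b"GET /index.html"]
site_b = [b"GET /a?x=%90%90%90", b"POST /login"]

confirmed = correlate(site_a, site_b)
print(confirmed)
```

An alert that appears independently in two administratively disjoint domains is less likely to be a local false positive, which is what raises the confidence of the detection.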

•  GMU  developed  a  scalable  distributed  data  structure  with  recoverable  encryption  for  cloud-based  services. 
LH*RE is a new Scalable Distributed Data Structure (SDDS) for hash files stored in a cloud. The client-side
symmetric encryption protects the data against server-side disclosure. The encryption key(s) at the
client are backed up in the file. The client may recover or revoke any keys lost or stolen from its node. A
trusted official can also do so on behalf of the client or of an authority, e.g., to access, when imperative, the
data of a missing or disabled client. In contrast, with high assurance, e.g., 99%, an attacker of the cloud should
not be able to disclose any data, even if the intrusion succeeds over dozens or possibly thousands of servers
for a larger file. Storage and primary key-based access performance of LH*RE should be close to those of the
well-known LH* SDDS. Two messages should typically suffice for a key-based search and four in the
worst case, with an application data load factor of 70%, regardless of how far the file scales up. These features
are among the most efficient for a hash SDDS. LH*RE should be attractive with respect to the competition.

• Columbia University developed Apiary, a lightweight virtual layered file system approach for isolating
untrusted applications from each other. Desktop computers run many different applications, the
compromise of any one of which can compromise the entire desktop given the lack of isolation among
applications.  Recovering  a  compromised  desktop  remains  a  time  consuming  task,  which  typically  requires 
wiping  everything  and  reinstalling  the  system  from  scratch.  These  security  issues  pose  fundamental 
challenges  as  desktop  computers  are  relied  on  for  everything  from  financial  transactions  to  medical  records. 
To  address  these  problems,  we  have  created  novel  virtual  layered  file  system  (VLFS)  technologies  to 
improve  system  security.  Unlike  a  traditional  file  system  which  is  a  monolithic  entity,  a  VLFS  dynamically 
composes  together  a  set  of  software  layers  into  a  single  file  system  view  for  a  desktop.  Changes  to  one 
layer  are  isolated  and  decoupled  from  changes  to  another.  The  VLFS  dynamic  composition  feature  enables 
powerful  and  easy-to-use  security  functionality.  We  are  using  VLFSes  to  build  an  architecture  to  enable 
security  patches  to  be  deployed  effectively  when  managing  large  numbers  of  heterogeneously  configured 
machines,  and  to  speed  system  recovery  from  security  exploits.  We  have  also  used  VLFSes  to  develop  a 
transparent  desktop  application  fault  containment  architecture  that  is  effective  at  limiting  the  damage  from 
exploits  to  enable  quick  recovery  while  being  as  easy  to  use  as  a  traditional  desktop  system. 
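
The layer composition idea behind a VLFS can be illustrated with a toy model. The sketch below uses Python's `collections.ChainMap` to stand in for a file system view composed from layers; the paths, layer names, and file contents are hypothetical, and a real VLFS operates on files rather than dictionary entries.

```python
from collections import ChainMap

# Each layer maps a path to its content; layers listed first shadow later ones.
base = {"/bin/sh": "shell v1", "/lib/libssl.so": "openssl (vulnerable)"}
patch = {"/lib/libssl.so": "openssl (patched)"}   # security update as its own layer
app_private = {}                                  # writable top layer for one application

view = ChainMap(app_private, patch, base)

# Writes land only in the top layer, so they stay isolated from shared layers.
view["/tmp/download"] = "untrusted file"

patched_lib = view["/lib/libssl.so"]
print(patched_lib, "/tmp/download" in base)

# Recovery after an exploit: discard the private layer, keep the shared ones.
app_private.clear()
print("/tmp/download" in view, view["/bin/sh"])
```

Deploying a patch amounts to adding or replacing one layer, and recovering from a compromised application amounts to dropping its private layer, which mirrors the two uses described above.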

• PSU developed PEDA: Cross-Layer Comprehensive Damage Assessment for Production Workload Server
Systems. Damage assessment (or intrusion harm analysis) is a critical step in intrusion recovery of
enterprise systems. However, for production workload server systems, the state-of-the-art technologies can
only do coarse-grained damage assessment at the system call level. (Coarse-grained damage assessment is
quite limited. In many cases, system call level damage assessment cannot identify all the damage (e.g., data
corruption) that has been caused by an intrusion.) Doing fine-grained damage assessment at the instruction
level is theoretically possible, but no existing fine-grained damage assessment technique is really practical
for production workload servers: existing techniques usually incur 15-100x binary instrumentation overhead.

PEDA  (Production  Environment  Damage  Assessment),  to  the  best  of  our  knowledge,  is  the  first  practical 
fine-grained  damage  assessment  technology  for  production  workload  server  systems.  Analyzing  the  harm 
of intrusion to enterprise servers is onerous and error-prone work. Though dynamic taint tracking
enables  automatic  fine-grained  intrusion  harm  analysis  for  enterprise  servers,  the  significant  runtime 
overhead  introduced  is  intolerable  in  the  production  workload  environment.  Our  proposed  system  PEDA 
decouples  the  onerous  analysis  work  from  the  online  execution  of  the  production  servers  and  analyzes  the 
“has-been-infected”  execution  during  high  fidelity  replay  on  a  separate  instrumentation  platform.  Through 
a  novel  heterogeneous  virtual  machine  migration  technique,  PEDA  allows  the  online  execution  of  servers 
to  run  atop  fast  hardware-assisted  virtual  machines  (Xen)  and  the  infected  execution  to  be  replayed  atop 
binary instrumentation virtual machines (QEMU) for intrusion harm analysis. This approach can
dramatically reduce the runtime overhead of online execution. In order to provide fine-grained taint seeds
to  decoupled  harm/damage  analysis,  PEDA  bridges  the  gap  between  backward  system  call  dependency 
tracking  and  forward  intrusion  taint  analysis  by  integrating  a  one-step-forward  auditing  approach. 

Evaluation demonstrates the efficiency of the PEDA system, with runtime overhead as low as 5%, and the
real-life intrusion studies show the comprehensiveness and the precision of PEDA damage
assessment.